Working with Strings - Part 3

© 2006, Brad Moore

Liberty Basic Connection

Home

Tip Corner: ByRef

API Corner: File Download

Working with Strings 3

Stylebits Corner: Dialogs

Eddie's Lessons, v.10

Liberty Basic Wiki

Preprocessor 1

Preprocessor 2

Find Folder

Multiple Listboxes

Newsletter help

Index


Introduction

Strings are of course a series of single characters. In one sense you can think of these as an array of characters. Using the MID$ function we can index into that array of characters. Remember from the first Strings article, the MID$ function takes three arguments:

MID$(string$, index, n)

string$ = the source string

index = the starting character in the string to begin the extraction

n = the number of characters to extract

Treating the string like an array, we can easily spin through a whole string, printing each character individually using a FOR-NEXT loop and the MID$ function. That works like this:


a$ = "My new red wagon is very shiny!"

for x = 1 to len(a$)
print mid$(a$,x,1)
next x

The trick to making the program see every character in the string is to know how long the string is. As you may recall from earlier discussions, this is done using the LEN function. We embedded the LEN function directly into our FOR-NEXT loop in the code above. This causes the loop to run from the first character (character number 1) to the last character in the string. This way we do not have to know the length of the string in advance to spin through the whole string.
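
The same loop can do more than print. As a minimal sketch (the function name countChar is made up for this illustration and is not part of Liberty BASIC), here is one way to count how many times a single character appears in a string:

```basic
a$ = "My new red wagon is very shiny!"
print "Spaces found: "; countChar(a$, " ")
end

'countChar is a hypothetical helper written for this example
function countChar(src$, ch$)
    'walk the string one character at a time, from 1 to len(src$)
    for x = 1 to len(src$)
        'total starts at zero because function variables are local
        if mid$(src$, x, 1) = ch$ then total = total + 1
    next x
    countChar = total
end function
```

Because LEN is evaluated on whatever string is passed in, the function works on strings of any length without changes.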

In a recent project I needed to remove all the double quotes from strings that were imported into my program from a file. I did the file import one line at a time using the LINE INPUT statement, so I had to evaluate many strings of varying lengths. The code we wrote above can be easily adapted to accomplish this need.


a$ = "My new " + chr$(34) + "red" + chr$(34) + " wagon is very shiny!"

print "Original: ";a$
a$ = removeQuotes$(a$)
print "New: ";a$

function removeQuotes$(src$)
for x = 1 to len(src$)
if mid$(src$,x,1) <> chr$(34) then out$ = out$ + mid$(src$,x,1)
next x
removeQuotes$ = out$
end function

Notice in the function that if the character being examined (using the MID$ function) is NOT a quote (ASCII 34, as you will recall from part 2 of this series), then we add the character to a buffer that contains our return string. This causes the function to effectively skip the quote characters when they are encountered. Run the code and check it out!

If you want to replace the quote character with another character, it would require a slightly more complex IF-THEN statement. Here is an example that replaces the comma character with a space character:


function replaceComma$(src$)
for x = 1 to len(src$)
if mid$(src$,x,1) = "," then
  out$ = out$ + " "
else
  out$ = out$ + mid$(src$,x,1)
end if
next x
replaceComma$ = out$
end function
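
If you find yourself writing several of these functions, the two characters can be passed in as arguments instead. The following is only a sketch: the name replaceChar$ and its parameters target$ and repl$ are assumptions made for this example, not built-in functions.

```basic
print replaceChar$("red,green,blue", ",", " ")
end

'replaceChar$ is a hypothetical generalization of replaceComma$
function replaceChar$(src$, target$, repl$)
    for x = 1 to len(src$)
        c$ = mid$(src$, x, 1)
        if c$ = target$ then
            'swap the matching character for the replacement text
            result$ = result$ + repl$
        else
            'keep every other character unchanged
            result$ = result$ + c$
        end if
    next x
    replaceChar$ = result$
end function
```

Passing an empty string as repl$ makes the function drop the character entirely, which is the same job removeQuotes$ does for the quote character.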

String parsing is one of the most complex things most people will do in programming. Every parsing need is slightly different from the last one. I recently completed a little project that had to manage its own settings file. I created some simple rules for the settings file, but since it was likely the settings file would be (and should be) edited by an outside party, I had to carefully parse it as it was read.

The rules were simple:

  1. Every blank line is ignored
  2. Comments begin with ##
  3. Every setting is enclosed in a pair of brackets
  4. Any text following the closing bracket on the same line as the setting is ignored (put comments there)
  5. The setting value must follow the setting on the next line
  6. End the settings file with [end] - no other lines are read after this.

The program to process this simple set of specifications is pretty complex. Let's dig into what it must do.

First we will need a working dat file to play with and test. Here is my sample dat file:


## This is a sample settings file
## These are comments - blank lines are skipped

[workingdir] This is the working directory
c:\lb4

[name] Name of the program
Cool Stuff Demo

[end]

You can copy this to a working directory, or you can get the actual dat file from the Newsletter archive for this issue.

First open the file (we called it settings.dat). We will set a flag that tells us we are not expecting a setting value. We will also check to make sure there is something in the dat file:


open "settings.dat" for input as #1
sval = 0

if eof(#1) < 0 then
    'there is nothing in the file - close it and quit
    close #1
    end
end if

Now we will use a simple loop to read the file using the LINE INPUT statement:


'set up a loop to read the file
do
    line input #1, a$
    print a$

loop while eof(#1) = 0
close #1
end

This simple skeleton reads the file one line at a time, prints each line to the console, and then reads the next. This proves we can get the input text. Now we need to start parsing it. Let's handle comments and blank lines:

We are going to slowly replace the body of this loop. We are handling blank lines and comment lines by ignoring them. That is pretty easy to do. Examine this code that is intended to replace the print statement in the loop above:


    length = len(a$)
    if length > 0 then
        'if this is not a blank line, process it...
        if left$(a$,2) <> "##" then
            'this is not a comment - process it...
            'print what is left for now
            print a$
        end if
    end if

If you put it all together and run it against the test file you will get the following output:


[workingdir] This is the working directory
c:\lb4
[name] Name of the program
Cool Stuff Demo
[end]

We need to handle the END condition. Do that by inserting the following code one line before the print a$ statement:


            if lower$(a$) = "[end]" then exit do
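
Note that this comparison only matches when the line is exactly [end]. Rule 4 allows text after the closing bracket, so if you want the end marker to tolerate a trailing comment or stray spaces, one possible variation (a sketch, not the test used in the rest of this article) is to compare only the first five characters:

```basic
'sample line with trailing text after the end marker
a$ = "[end] nothing below this line is read"

'trim stray spaces, fold to lower case, then look at the first 5 characters
if left$(lower$(trim$(a$)), 5) = "[end]" then
    print "end marker found"
else
    print "not an end marker"
end if
```

Inside the reading loop, the same condition would take the place of the lower$(a$) = "[end]" test.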

This leaves us with the real parsing job. It is really not overly complex for our example. We know we are either looking for a setting or a setting value. Our rules require that a setting precede its value. We set a flag called "sval" to zero. When it is zero we are looking for a setting. When it is one we are looking for a setting value. If they come out of order then we will say there was an error.

We know that settings always start with a bracket. In the real program I validated my setting values against a master table of valid settings. In this case we will not be adding that level of complexity. We will say that if sval = 0 then we are looking for the first character to be a "[" - anything else is an error.
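
For readers who do want that extra check, here is a rough sketch of one way it could be done. Everything in it is an assumption made for illustration: the function name isValidSetting, the variable valid$, and the idea of keeping the accepted names in a space-separated string are not taken from the real program.

```basic
'hypothetical list of accepted setting names, separated by spaces
valid$ = " workingdir name "

print isValidSetting("name", valid$)      'prints 1
print isValidSetting("colour", valid$)    'prints 0
end

'isValidSetting is a made-up helper: returns 1 if key$ is in table$, else 0
function isValidSetting(key$, table$)
    'surround the key with spaces so "name" cannot match part of a longer word
    if instr(table$, " " + lower$(key$) + " ") > 0 then isValidSetting = 1
end function
```

The rest of this article skips this step and accepts any name found between the brackets.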

The loop is getting pretty complex. Here is what it looks like incorporating this logic:


'set up a loop to read the file
do
    line input #1, a$
    length = len(a$)
    if length > 0 then
        'if this is not a blank line, process it...
        if left$(a$,2) <> "##" then
            'this is not a comment - process it...
            'stop reading once we reach the end marker
            if lower$(a$) = "[end]" then exit do
            if sval = 0 then
               if left$(a$,1) <> "[" then
                  print "there was an error - setting expected"
                  end
               else
                  'parse the setting from the text string
                  print "setting: ";a$
                  sval = 1
               end if
            end if
        end if
    end if

loop while eof(#1) = 0

Note that if sval = 0 and you have a valid setting, then we print the setting (for now) and set sval equal to 1. We will need to reset it to keep reading more settings, but that comes later.

The first matter at hand is parsing the setting string once we know it is valid. We can apply some of the techniques we have been discussing. We need to locate the closing bracket (if it is not there we have an error), then extract the setting name from between the two brackets. We can locate the closing bracket with the INSTR function.

Use:


pos = INSTR(a$,"]",1)

Using that position stored in pos we can calculate the part of the string to extract to get the setting name. Here is that portion of the parsing loop:
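
To see the arithmetic at work, here is a small stand-alone sketch that runs the calculation on the first setting line of the sample file. The variable rest$ is an addition for this illustration only; it shows the trailing text that rule 4 tells us to ignore.

```basic
a$ = "[workingdir] This is the working directory"

'find the closing bracket - here it is character 12
pos = instr(a$, "]", 1)

'the name starts at character 2 and stops just before the bracket,
'so it is pos - 2 characters long
setting$ = mid$(a$, 2, pos - 2)

'everything after the bracket is the comment that rule 4 ignores
rest$ = trim$(mid$(a$, pos + 1))

print "pos     = "; pos
print "setting = "; setting$
print "ignored = "; rest$
```

The opening bracket occupies character 1 and the closing bracket occupies character pos, which is why two is subtracted from pos to get the length of the name.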


            if sval = 0 then
               if left$(a$,1) <> "[" then
                  print "there was an error - setting expected"
                  end
               else
                  'parse the setting from the text string
                  pos = INSTR(a$,"]",1)
                  if pos = 0 then
                     print "error parsing setting - improper format"
                     end
                  else
                     'we can now extract the setting name
                     setting$ = mid$(a$,2,pos-2)
                     print "setting: ";setting$
                     sval = 1
                  end if
               end if
            end if

What remains is to handle the setting value. Since it can be any string we will simply echo it to the console. We will also change the sval flag back to zero so we can process the next setting and value pair.

It only took three additional lines of code. Here is the entire parsing program:


open "settings.dat" for input as #1
sval = 0

if eof(#1) < 0 then
    'there is nothing in the file - close it and quit
    close #1
    end
end if

'set up a loop to read the file
do
    line input #1, a$
    length = len(a$)
    if length > 0 then
        'if this is not a blank line, process it...
        if left$(a$,2) <> "##" then
            'this is not a comment - process it...
            'stop reading once we reach the end marker
            if lower$(a$) = "[end]" then exit do
            if sval = 0 then
               if left$(a$,1) <> "[" then
                  print "there was an error - setting expected"
                  end
               else
                  'parse the setting from the text string
                  pos = INSTR(a$,"]",1)
                  if pos = 0 then
                     print "error parsing setting - improper format"
                     end
                  else
                     'we can now extract the setting name
                     setting$ = mid$(a$,2,pos-2)
                     print "setting name : ";setting$
                     sval = 1
                  end if
               end if
            else
               print "setting value: ";a$
               sval = 0
            end if
        end if
    end if

loop while eof(#1) = 0
close #1
end

Notice how the program, in spite of its complexity, does not use a single GOTO. This is structured code. All the nested IF-THEN statements can make the code a challenge to read at times, but it is clean and easy to maintain.

These are the many challenges of working with strings. Parsing can be tricky, but the main hurdle is to really understand your data. If you know it well and know what you want to do with it, you can parse it.

There are more articles on parsing in the newsletter. I encourage you to check them out. This concludes the working with strings series.

Thanks - Brad

